25 research outputs found

Query-Based Document Skimming: A User-Centred Evaluation of Relevance Profiling

Author: D. Harman
M. Beaulieu
M. Kaszkiel
P. Borlund
T.R.G. Green
W. Hersh
Publication venue: Proceedings of the 25th European Conference on Information Retrieval. Lecture Notes in Computer Science
Publication date: 01/01/2003

We present a user-centred, task-oriented, comparative evaluation of two query-based document skimming tools. ProfileSkim bases within-document retrieval on computing a relevance profile for a document and query; FindSkim provides functionality similar to a web browser's Find command. A novel simulated work task was devised, in which experiment participants are asked to identify (index) relevant pages of an electronic book, given subjects from the existing book index. This subject index provides the ground truth against which the indexing results can be compared. Our major hypothesis was confirmed: ProfileSkim proved significantly more efficient than FindSkim, as measured by time to complete the task. Moreover, indexing task effectiveness, measured by typical IR measures, showed that ProfileSkim was better than FindSkim at identifying relevant pages, although not significantly so. The experiments confirm the potential of relevance profiling to improve query-based document skimming, which should prove highly beneficial for users trying to identify relevant information within long documents.

Research at Sofia University

Crossref

Background: With the growing availability of full-text articles online, scientists and other consumers of the life sciences literature now have the ability to go beyond searching bibliographic records (title, abstract, metadata) to directly access full-text content. Motivated by this emerging trend, I posed the following question: is searching full text more effective than searching abstracts? This question is answered by comparing text retrieval algorithms on MEDLINE® abstracts, full-text articles, and spans (paragraphs) within full-text articles using data from the TREC 2007 genomics track evaluation. Two retrieval models are examined: bm25 and the ranking algorithm implemented in the open-source Lucene search engine. Results: Experiments show that treating an entire article as an indexing unit does not consistently yield higher effectiveness compared to abstract-only search. However, retrieval based on spans, or paragraph-sized segments of full-text articles, consistently outperforms abstract-only search. Results suggest that the highest overall effectiveness may be achieved by combining evidence from spans and full articles. Conclusion: Users searching full text are more likely to find relevant articles than those searching only abstracts. This finding affirms the value of full-text collections for text retrieval and provides a starting point for future work in exploring algorithms that take advantage of rapidly growing digital archives. Experimental results also highlight the need to develop distributed text retrieval algorithms, since full-text articles are significantly longer than abstracts and may require the computational resources of multiple machines in a cluster. The MapReduce programming model provides a convenient framework for organizing such computations.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Repository at the University of Maryland

Extraction of Coherent Relevant Passages using Hidden Markov Models

Author: Allan J.
Callan J. P.
Chengxiang Zhai
Clarke C. L. A.
Cormack G. V.
Corrada-Emmanuel A.
Denoyer L.
Freitag D.
Fung P.
He D.
Hearst M. A.
Jiang J.
Jing Jiang
Kaszkiel M.
Knaus D.
Lavrenko V.
Liu X.
Mittendorf E.
Rabiner L. R.
Salton G.
Tellex S.
Zajic D.
Zhai C.
Zhai C.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/07/2006

Crossref

Institutional Knowledge at Singapore Management University

Humans optional? Automatic large-scale test collections for entity, passage, and entity-passage retrieval

Author: A Kembhavi
B Dalvi
C Boston
C Shah
C Wade
C Xiong
C Xiong
C Xiong
D Bodoff
E Choi
E Gabrilovich
E Yilmaz
EM Voorhees
G Demartini
GK Jayasinghe
GV Cormack
H Bast
H Bast
H Bota
H Raviv
H Zhang
I Soboroff
J Allan
J Dalton
J Dalton
J Foley
J Kamps
J Kamps
J O’Connor
J Pennington
JP Callan
JR Frank
K Balog
L Azzopardi
L Dietz
L Dietz
L Dietz
M Kaszkiel
M Schuhmacher
N Asadi
O Alonso
P Arvola
P Ferragina
PN Mendes
R Berendsen
R Berendsen
R Blanco
R Nogueira
S Arnold
S Chatterjee
S MacAvaney
SM Beitzel
T Sakai
U Sawant
X Wan
Y Yang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2020

Manually creating test collections is a time-, effort-, and cost-intensive process. This paper describes a fully automatic alternative for deriving large-scale test collections, where no human assessments are needed. The empirical experiments confirm that the automatic test collection and manual assessments agree on the best-performing systems. The collection includes relevance judgments for both text passages and knowledge base entities. Since test collections with relevance data for both entities and text passages are rare, this approach provides a cost-efficient way of training and evaluating ad hoc passage retrieval, entity retrieval, and entity-aware text retrieval methods.

Crossref

Enlighten